Source coding, large deviations, and approximate pattern matching
Authors
Abstract
In this review paper, we present a development of parts of rate-distortion theory and pattern-matching algorithms for lossy data compression, centered around a lossy version of the asymptotic equipartition property (AEP). This treatment closely parallels the corresponding development in lossless compression, a point of view that was advanced in an important paper of Wyner and Ziv in 1989. In the lossless case, we review how the AEP underlies the analysis of the Lempel–Ziv algorithm by viewing it as a random code and reducing it to the idealized Shannon code. This also provides information about the redundancy of the Lempel–Ziv algorithm and about the asymptotic behavior of several relevant quantities. In the lossy case, we give various versions of the statement of the generalized AEP and outline the general methodology of its proof via large deviations. Its relationship with Barron and Orey’s generalized AEP is also discussed. The lossy AEP is applied to i) prove strengthened versions of Shannon’s direct source-coding theorem and universal coding theorems; ii) characterize the performance of “mismatched” codebooks in lossy data compression; iii) analyze the performance of pattern-matching algorithms for lossy compression (including Lempel–Ziv schemes); and iv) determine the first-order asymptotics of waiting times between stationary processes. A refinement of the lossy AEP is then presented, and it is used to i) prove second-order (direct and converse) lossy source-coding theorems, including universal coding theorems; ii) characterize which sources are quantitatively easier to compress; iii) determine the second-order asymptotics of waiting times between stationary processes; and iv) determine the precise asymptotic behavior of longest match-lengths between stationary processes. Finally, we discuss extensions of the above framework and results to random fields.
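The lossy AEP says that, for a typical source string X_1^n, the probability Q^n(B(X_1^n, D)) of a distortion ball around it decays roughly like exp(−n R(P, Q, D)). Here Q^n is the probability under a random-codebook distribution. The sketch below computes this exponent numerically, under stated assumptions:

- P and Q are memoryless (i.i.d.) on finite alphabets, and ρ is a bounded per-letter distortion.
- The exponent is taken in its Legendre-transform form, R(P, Q, D) = sup over λ ≤ 0 of [λD − Λ(λ)], where Λ(λ) = E_P[log E_Q e^{λρ(X,Y)}].

The helper names (`log_mgf`, `lossy_aep_rate`, `hamming`) and the search bounds are hypothetical illustrative choices, not taken from the paper.

```python
import math
from typing import Callable, Sequence

# Assumed interface: per-letter distortion on alphabets indexed 0, 1, 2, ...
Distortion = Callable[[int, int], float]


def hamming(x: int, y: int) -> float:
    """Per-letter Hamming distortion: 0 if the symbols agree, 1 otherwise."""
    return 0.0 if x == y else 1.0


def log_mgf(lam: float, p: Sequence[float], q: Sequence[float], rho: Distortion) -> float:
    """Lambda(lam) = E_P[ log E_Q[ exp(lam * rho(X, Y)) ] ] for independent X ~ P, Y ~ Q."""
    total = 0.0
    for x, px in enumerate(p):
        if px == 0.0:
            continue  # zero-probability source symbols do not contribute
        inner = sum(qy * math.exp(lam * rho(x, y)) for y, qy in enumerate(q))
        total += px * math.log(inner)
    return total


def lossy_aep_rate(p: Sequence[float], q: Sequence[float], rho: Distortion,
                   d: float, lam_max: float = 50.0, iters: int = 200) -> float:
    """Approximate R(P, Q, D) = sup_{lam <= 0} [lam * D - Lambda(lam)], in nats.

    Lambda is convex, so lam * D - Lambda(lam) is concave in lam and a ternary
    search on [-lam_max, 0] converges to its maximum. Because lam is capped at
    -lam_max (a hypothetical bound), an infinite exponent (D below the smallest
    achievable distortion) shows up only as a large finite value.
    """
    def objective(lam: float) -> float:
        return lam * d - log_mgf(lam, p, q, rho)

    lo, hi = -lam_max, 0.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1  # for a concave objective the maximizer is not left of m1
        else:
            hi = m2  # ... and here it is not right of m2
    return objective((lo + hi) / 2.0)


if __name__ == "__main__":
    d = 0.1
    fair = [0.5, 0.5]
    binary_entropy = -d * math.log(d) - (1 - d) * math.log(1 - d)
    # For P = Q = Bernoulli(1/2) with Hamming distortion the exponent is log 2 - h(D).
    print(lossy_aep_rate(fair, fair, hamming, d), math.log(2) - binary_entropy)
```

For P = Q = Bernoulli(1/2) with Hamming distortion, setting the derivative to zero gives λ = log(D/(1 − D)). The exponent then reduces to log 2 − h(D) nats for 0 < D ≤ 1/2, which is the closed form the usage lines compare against. For D ≥ 1/2 the supremum sits at λ = 0 and the exponent is zero.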
Similar resources
On Approximate Pattern Matching for a Class of Gibbs Random Fields
We prove an exponential approximation for the law of approximate occurrence of typical patterns for a class of Gibbsian sources on the lattice Z^d, d ≥ 2. From this result, we deduce a law of large numbers and a large deviation result for the waiting time of distorted patterns. Keywords: Gibbs measures, approximate matching, exponential law, lossy data compression, law of large numbers, larg...
A suboptimal lossy data compression based on approximate pattern matching
Wojciech Szpankowski, Department of Computer Science, Purdue University, W. Lafayette, IN 47907, U.S.A. A practical suboptimal (variable source coding) algorithm for lossy data compression is presented. This scheme is based on approximate string matching, and it naturally extends the lossless Lempel-Ziv data compression scheme. Among others we consider the typical length of appro...
From coding theory to efficient pattern matching
We consider the classic problem of pattern matching with few mismatches in the presence of promiscuously matching wildcard symbols. Given a text t of length n and a pattern p of length m with optional wildcard symbols and a bound k, our algorithm finds all the alignments for which the pattern matches the text with Hamming distance at most k and also returns the location and identity of each mis...
On Approximate Pattern Matching for a Class of Gibbs Random Fields by Jean-René Chazottes,
We prove an exponential approximation for the law of approximate occurrence of typical patterns for a class of Gibbsian sources on the lattice Z^d, d ≥ 2. From this result, we deduce a law of large numbers and a large deviation result for the waiting time of distorted patterns. 1. Introduction. In recent years there has been growing interest in a detailed probabilistic analysis of pattern matc...
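Two quantities come up repeatedly above: the waiting time until a distorted pattern first appears, and the longest approximate match-length. A brute-force sketch makes both concrete, under these assumptions:

- Distortion is the per-letter (normalized) Hamming distortion.
- The waiting time W_n(D) is the first 1-based position at which a length-n window of the database y lies within distortion D of x_1^n.
- The match length is the largest n for which the prefix x_1^n is D-close to some window of y.

The function names are hypothetical. This is a quadratic-time illustration, not the algorithm of any paper listed here.

```python
from typing import Optional, Sequence


def hamming_distortion(x: Sequence, y: Sequence) -> float:
    """Per-letter Hamming distortion (1/n) * #{i : x_i != y_i} for equal-length x, y."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def waiting_time(x: Sequence, y: Sequence, d: float) -> Optional[int]:
    """Hypothetical W_n(D): smallest 1-based i such that the window y[i-1 : i-1+n]
    is within distortion d of x (n = len(x)), or None if no window qualifies."""
    n = len(x)
    if n == 0:
        raise ValueError("pattern must be non-empty")
    for start in range(len(y) - n + 1):
        if hamming_distortion(x, y[start:start + n]) <= d:
            return start + 1  # report positions 1-based
    return None


def longest_match(x: Sequence, y: Sequence, d: float) -> int:
    """Hypothetical match length: largest n such that the prefix x[:n] is within
    distortion d of some length-n window of y (0 if no prefix qualifies).

    With d > 0 a longer prefix can match even when a shorter one does not, so
    every n is checked instead of stopping at the first failure.
    """
    best = 0
    for n in range(1, min(len(x), len(y)) + 1):
        if waiting_time(x[:n], y, d) is not None:
            best = n
    return best


if __name__ == "__main__":
    print(waiting_time("abcd", "zzabxdzz", 0.25))
    print(longest_match("abcd", "zzabxdzz", 0.0), longest_match("abcd", "zzabxdzz", 0.25))
```

On the toy strings, `waiting_time("abcd", "zzabxdzz", 0.25)` returns 3, because the window "abxd" differs in one of four letters. With zero distortion the longest match is 2, since only "ab" occurs exactly. With D = 0.25 the whole pattern matches (length 4) even though no length-3 prefix does. This illustrates why match lengths under distortion are not monotone in n.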
Journal:
- IEEE Trans. Information Theory
Volume 48, Issue
Pages: -
Published: 2002